Task 1.
PD
\begin{gather*} f( x_{1} ,x_{2}) =( x_{1} +x_{2})^{2}\\ g_{PD}^{1}( z) =E_{X_{2}}( z+x_{2})^{2} =z^{2} +2zE_{X_{2}} x_{2} +E_{X_{2}} x_{2}^{2} =z^{2} +\frac{1}{3} \end{gather*}
(In PD the expectation $\displaystyle E_{X_{2}}$ is taken over the marginal distribution of $\displaystyle X_{2}$, independently of $\displaystyle z$; for $\displaystyle X_{2} \sim U( -1,1)$ we have $\displaystyle E_{X_{2}} x_{2} =0$ and $\displaystyle E_{X_{2}} x_{2}^{2} =\frac{1}{3}$.)
MP
\begin{gather*} g_{1}^{MP}( z) =E_{x_{2} |x_{1} =z}( x_{1} +x_{2})^{2} =( z+z)^{2} =4z^{2} \end{gather*}
(Since $\displaystyle X_{2} =X_{1}$, conditioning on $\displaystyle x_{1} =z$ fixes $\displaystyle x_{2} =z$.)
ALE
\begin{gather*} g_{1}^{AL}( z) =\int _{-1}^{z} E_{x_{2} |x_{1} =v}\frac{\partial ( x_{1} +x_{2})^{2}}{\partial x_{1}} dv=\int _{-1}^{z} E_{x_{2} |x_{1} =v} 2( x_{1} +x_{2}) dv=\\ \int _{-1}^{z} 4v\,dv=\left[ 2v^{2}\right]_{-1}^{z} =2z^{2} -2 \end{gather*}
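These closed forms can be sanity-checked numerically. Below is a minimal Monte Carlo sketch, assuming $X_{1} \sim U( -1,1)$ and $X_{2} =X_{1}$ (the perfectly correlated setup used in the derivations); all variable names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 1_000_000)
x2 = x1.copy()  # assumption: X2 = X1 (perfect correlation)

z = 0.5

# PD: average over the *marginal* distribution of X2, ignoring correlation
pd_val = ((z + x2) ** 2).mean()       # analytic: z^2 + 1/3

# MP: average over the conditional X2 | X1 = z; here X2 = z exactly
mp_val = (z + z) ** 2                 # analytic: 4 z^2

# ALE: integrate E[df/dx1 | X1 = v] = 4v from -1 to z (midpoint rule,
# which is exact for a linear integrand)
grid = np.linspace(-1, z, 1001)
mids = (grid[:-1] + grid[1:]) / 2
ale_val = np.sum(4 * mids * np.diff(grid))  # analytic: 2 z^2 - 2
```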
Task 2.
As in the previous assignment, we will be working with the "MagicTelescope" dataset, which is designed to mimic the detection of high-energy gamma particles. The 'TARGET' column in this dataset contains two types of values:
- 'g' represents gamma (signal), which we will label as 0,
- 'h' stands for hadron (background), which we will assign the label 1.
We'll first use the XGBoost model, and later the CatBoost model for comparison.
Let me remind you that the dataset looks like this:
We'll train the XGBClassifier model with the default parameters.
We can check its accuracy on the test set:
For the rest of the task, we'll mainly focus on a randomly selected 100-element subset of the test set.
Let's create our own function to calculate Ceteris Paribus explanations.
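A minimal model-agnostic version of such a function might look like the sketch below (the function name and signature are my own; it only assumes the model exposes a vectorized prediction function):

```python
import numpy as np

def ceteris_paribus(predict_fn, x, feature_idx, grid):
    """For one observation x, vary feature `feature_idx` over `grid`
    while holding every other feature fixed; return the predictions."""
    X_what_if = np.tile(np.asarray(x, dtype=float), (len(grid), 1))
    X_what_if[:, feature_idx] = grid
    return predict_fn(X_what_if)

# usage with a toy prediction function: p = sigmoid(x0 + 2*x1)
def toy_predict(X):
    return 1.0 / (1.0 + np.exp(-(X[:, 0] + 2 * X[:, 1])))

x = np.array([0.5, -1.0, 3.0])
grid = np.linspace(-2, 2, 5)
profile = ceteris_paribus(toy_predict, x, feature_idx=1, grid=grid)
```

Because the toy model is increasing in the varied feature, the resulting profile is monotonically increasing; on the real model the profile shape is what the plots below visualize.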
For each of the first 10 rows of the sampled set, we'll plot the Ceteris Paribus profile for the feature fDist:. The red dots represent the original values of the feature, while the corresponding lines represent the "what-if" scenarios for a given value of fDist:.
We can see, for example, that observations 2503 and 13317 have different profiles for this feature. Let's plot just those two below.
We can see there's a lot of correlation between fDist: and other features:
There's also a lot of correlation between features in general, as seen on this correlation heatmap:
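The values behind such a heatmap come from a plain feature-correlation matrix; a minimal sketch with synthetic stand-in columns (the data here is mine, not the MagicTelescope features):

```python
import numpy as np

# Synthetic stand-in: two strongly correlated columns and one independent one
rng = np.random.default_rng(0)
n = 5000
base = rng.normal(size=n)
data = np.column_stack([
    base + 0.3 * rng.normal(size=n),   # plays the role of fDist:
    base + 0.3 * rng.normal(size=n),   # strongly correlated with the first
    rng.normal(size=n),                # independent column
])

# Pairwise Pearson correlations; this matrix is what the heatmap displays
corr = np.corrcoef(data, rowvar=False)
```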
I've tried to see what happens when we set the values in columns highly correlated with fDist: (I set the threshold at absolute value $\geq0.4$) to their mean values from the entire dataset.
I was surprised by how flat the line was for observation 13317. It probably means that the main variability in the prediction with respect to fDist: for that observation was not due to fDist: itself, but rather to the features fLength: and fSize:, which are highly correlated with it.
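The masking step described above can be sketched like this (the function name is mine; `corr` is the feature-correlation matrix and `X` the sample being modified):

```python
import numpy as np

def neutralize_correlated(X, corr, feature_idx, threshold=0.4):
    """Replace every column whose absolute correlation with column
    `feature_idx` is >= threshold (except that column itself) with
    its mean over the whole sample."""
    X_out = np.asarray(X, dtype=float).copy()
    strong = np.abs(corr[feature_idx]) >= threshold
    strong[feature_idx] = False  # keep the feature of interest untouched
    X_out[:, strong] = X_out[:, strong].mean(axis=0)
    return X_out

# usage on synthetic data: column 1 is strongly correlated with column 0
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
X = np.column_stack([base,
                     base + 0.1 * rng.normal(size=1000),
                     rng.normal(size=1000)])
corr = np.corrcoef(X, rowvar=False)
X_neutral = neutralize_correlated(X, corr, feature_idx=0)
```

After the call, the correlated column is constant at its mean, so any remaining profile variability must come from the feature of interest itself.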
Let's now train the CatBoost model.
We can now compare the PDP profiles for the XGBoost and CatBoost models (also with a custom-built function).
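A PDP can be computed by hand as the average of Ceteris Paribus profiles over the sample; a minimal model-agnostic sketch (the function name and signature are my own):

```python
import numpy as np

def partial_dependence(predict_fn, X, feature_idx, grid):
    """For each grid value, set the chosen feature to that value in
    every row of X and average the predictions ("what-if" averaging)."""
    X = np.asarray(X, dtype=float)
    pdp = np.empty(len(grid))
    for i, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = v
        pdp[i] = predict_fn(X_mod).mean()
    return pdp

# usage with a toy additive model: f(x) = x0^2 + x1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.array([-1.0, 0.0, 1.0])
pdp = partial_dependence(lambda A: A[:, 0] ** 2 + A[:, 1], X, 0, grid)
```

For an additive model the PDP for feature 0 is exactly `grid**2` shifted by the mean contribution of the other feature, which makes the toy case easy to verify.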
As we can see, they are remarkably similar.
The CatBoost plot is also much smoother, probably due to the higher tree depth I set for the CatBoost model. This way there are many small "jumps" in the CatBoost PDP plot, compared to fewer, larger "jumps" in the XGBoost plot, which creates the illusion of a smooth curve.